Objectives: The aim of the study was to develop a metadata and ontology-based health information search engine ensuring
semantic interoperability to collect and provide health information using different application programs. Methods: A health
information metadata ontology was developed using a distributed semantic Web content publishing model based on
vocabularies used to index the contents generated by the information producers as well as those used to search the contents by
the users. Vocabulary for health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical
Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed
by adding an element describing the target audience to the Dublin Core Metadata Element Set. Results: A metadata schema
and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and
ontology-based health information search engine developed in this study produced better search results than
existing search engines. Conclusions: A health information search engine based on metadata and ontology will provide reliable
health information to both information producers and information consumers.

I. Introduction

As health information provided on the Internet is created by
various institutions and individuals with varying degrees of
credibility and its target audiences range from the general
public to healthcare professionals, the quality and type of
information provided also vary widely. In such an environment,
it is not easy for ordinary people to find reliable information
relevant to their health problems on the Internet. To deal with
this issue, health portals have emerged: websites that act as
gateways to other sites, aggregating content from multiple
resources and presenting it to consumers.
Types of Internet portals include service portals that gather 
and present various services, community portals that pro- 
vide virtual meeting places, and information portals that act 
as data hubs by collecting and presenting data for users. 

Health Park (http://www.healthpark.or.kr/) is a health
information portal maintained by the Korea Institute for
Health and Social Affairs to provide reliable health
information to Korean citizens. It presents search results on
health information for the promotion and management of Korean
citizens' health under cooperative partnership with various
health information providers. The search results presented
by Health Park include basic search results collected from
health-related websites operated by partner institutions and
classified by healthcare specialists as well as search results
automatically collected and classified by a robot program.
Currently, Health Park allows its users to search health and
wellness information according to body locations and
demographic groups, such as age and sex, as well as in Korean
alphabetical order. However, the search capabilities of
Health Park are quite limited compared to those of MedlinePlus,
the health information portal operated by the National
Library of Medicine in the United States.

In general, when information is published on the Web, con- 
tent producers create Web pages individually and then link 
them together. Information portals, such as Health Park, col- 
lect and integrate information produced by others and then 
offer the information to users via link services. Link service 
to the relevant websites on an information portal is automat- 
ically processed by computers, mainly using search engines. 

Existing health information portals that collect and pres- 
ent health information for the general public pose various 
problems for the information users and producers. From the 
users' perspective, they need to have sufficient knowledge 
about the institutions that provide content in order to search 
for relevant content. Meanwhile, from the information pro- 
ducers' perspective, it is difficult and time-consuming to find 
suitable keywords or metadata to describe the Web pages 
and documents they produce. 

One way to solve these problems is for information portals 
to use the distributed content generation model that ensures 
semantic interoperability. By utilizing this model, it is pos- 
sible to collect and reuse information produced by numerous 
content producers. Semantic interoperability can be achieved 
by using a reference terminology model, a standardized 
clinical terminology system, and knowledge modeling, such 
as metadata that connects the two [1]. 

Metadata is structured information that describes, explains, 
locates, or otherwise makes it easier to retrieve, use, or 
manage an information resource [2]. Descriptive metadata 
describes a resource for purposes, such as discovery and 
identification. It can include elements, such as title, author, 
abstract, and keywords. Administrative metadata provides 
information to help manage a resource, such as when and 
how it was created, file type and other technical information,
and who can access it. Descriptive metadata can be
created by domain experts who produce information using
data standards, such as the Systematized Nomenclature of
Medicine Clinical Terms (SNOMED CT) [3], International
Classification of Diseases (ICD) [4], Current Procedural
Terminology (CPT) [5], LOINC [6], and HL7 [7]. Administrative
metadata can be created by IT specialists who manage
automatically collected data using information tools.

The semantic interoperability of a health information search 
engine can be achieved by using a common ontology to fill 
values of metadata schema [8]. In the field of health and medi- 
cal care, ontology has been used in the management of genetic 
information [9], knowledge base building for medical diagno- 
ses [10], search and classification of disease information [11], 
and description of nursing processes [12-14]. 

With this background, this study aimed to develop a health
information search engine that ensures semantic
interoperability. The engine efficiently collects health
information generated by content producers using a distributed
semantic Web content publishing model (Semantic Web,
http://www.w3.org) [15] based on a metadata and terminology
system, and supplies health information using various
application programs.

II. Methods 

1. Study Framework 

The distributed content creation model was used for the de- 
velopment and reuse of health information in this study. The 
use of this model was justified as follows. First, this model 
enables a Web application to reuse health-related content 
produced by another institution, minimizing redundant 
work and the cost of content production. Second, this model 
enables a computer system to automatically maintain se- 
mantic links and gather content produced by various content 
producers, minimizing the maintenance cost of a portal. 
Since content in a certain subject domain can be created at 
any time, the system should be able to search and upload the 
new information on the portal as well as update links to rel- 
evant information automatically. Third, the model can pro- 
vide an intelligent service which enables users to search in- 
formation they want from their own perspectives and search 
content using semantic relationships. To achieve this, ter- 
minology used in the user interface should comprise terms 
used by the general public, different from the terminology 
used by the content providers for indexing the content. 

The distributed content creation model is outlined in Fig- 
ure 1. Content producers shown on the left side of the figure 
develop and provide information, such as Web pages and 
documents. This content is indexed using metadata schema 
and ontology so that others can use it as well. Then, the
indexed content is stored in the knowledge base for reuse in
the secondary application.

In this study, we aimed to build a metadata schema and an
ontology that can be used for indexing information and to apply
them to the health information search engine of Health
Park. If content collected by Health Park is indexed using
metadata and ontology and stored, semantic interoperability
of information can be ensured. It would then be possible to
share, exchange, and reuse the information.

2. Development of Metadata Schema 

Web documents to be searched on a health information
portal should be described in a format that can be understood
by a machine. The metadata schema defines the fields
(properties) required for presenting information about an
individual document. The values of metadata fields are filled
with free text comprehensible to humans (e.g., document title),
structured text strings (e.g., date of publication), or
predefined ontology concepts (e.g., subject heading). Some
of these fields are required, and some can be recorded
more than once. The metadata schema not only provides the
format specifications that show the optionality of each field
in describing individual documents, but also helps content
producers create suitable content and serves as an interface
for testing the validity of the content before it is published.

The metadata schema of the health information search 
engine developed in this study was based on the Dublin 
Core Metadata Element Set [16]. Additional elements were 
identified and added to describe the health information 
documents in more detail. We specified mandatory elements 
among these metadata elements. 

3. Development of Ontology 

Ontologies to be used for the health information search
engine included types of medium of resources, target audience,
and subject heading. The ontology for the types of
medium used to classify the types of health information was 
developed using the Type Vocabulary of the Dublin Core 
Metadata Initiative (DCMI) [17], and the ontology for the 
target audience was developed with consideration of the 
concepts used to differentiate health information that is sup- 
plied only for specific audience groups. The ontology for the 
subject heading used to describe the themes of Web content 
was developed to include the symptoms of health problems, 
diagnostic methods, treatment methods, and prevention 
methods. 

The vocabulary used to describe the subject heading ontology
consists of concepts classifying the content of health
information into disease information, health promotion/disease
prevention information, and diagnosis/treatment management
information. We collected and analyzed vocabulary
describing the 'Title/Contents' under the 'Health and Disease
Information-Disease/Surgery/Treatment/Diagnostic Tests
Information' menu of the 'Health iN (http://hi.nhic.or.kr)' portal
site operated by the National Health Insurance Corporation,
a partner organization of Health Park. After the morpheme
analysis of the collected titles and content, a health
information classifier was developed by comparing the analyzed
data with the ontology metadata database. This classifier handles
analogous terms and synonyms used by the general public.

4. Development of Search Engine 

Commercial search engines, such as Repia External Knowledge
Management System (REKMS-3.0) and Repia Search
Appliance for website (RSA-3.5), were used as the main
solutions for the search engines. The C programming language
was used for search engine coding, and JSP was used for the
user interface design. The server ran Red Hat Enterprise Linux
5.5 (32-bit) with 4 GB of memory, a 300 GB SAS hard disk drive,
MySQL Database ver. 5.0.95, Apache HTTP Server ver. 2.2.3, and
Apache Tomcat 6.0.18.

[Figure 1. Outline of generated health content and process reuse.
Stages shown: content creation, primary application, content
harvest, secondary application, end-users.]

The performance of the search engine was tested by comparing
the search results of the existing system with those of the
newly developed search system.

III. Results 

The ontology infrastructure of health information search
engines consists of a metadata schema and an ontology
vocabulary. The metadata schema defines elements to describe
Web documents and the types of values the elements can
take. The metadata schema can be utilized not only in the
extraction and classification of existing health information,
but also in the production of health information content by
individuals or institutions to guarantee syntactic
interoperability of content. The ontology vocabulary system
consists of concepts to be used to fill values of metadata
schema. If individuals or institutions that develop content use
the ontology vocabulary system, the semantic interoperability of
content will be ensured.

Table 1. Metadata schema of health information

| Category | Term name | Optionality | Cardinality | Value type | Value range | Definition |
|---|---|---|---|---|---|---|
| General metadata | Identifier | Mandatory | 1 | URI | | Reference to resource |
| | Title | Mandatory | 1 | Free text | Non-empty string | Assignment of label to resource; Resource is known by official name |
| | Description | Optional | 0..n | Free text | Non-empty string | Explanation for resources (abstract, contents, graphical representation, etc.) |
| | Contributor | Optional | 0..n | Instance | Personal, organization and services | Creation of the resources object |
| | Language | Optional | 0..n | String | RFC4546 | Developed language to resource |
| | Date | Optional | 0..n | String | W3CDTF (ISO 8601) | The life cycle of the resource characteristics, event-related timeline |
| | Publisher | Optional | 0..n | Instance | Personnel, group and services | The primary responsibility for the creation of the resource object |
| | Creator | Optional | 0..n | Instance | Personnel, organization and services | The primary responsibility for the creation of the resource object |
| Content classification | Subject | Mandatory | 1..n | Concept | Keywords or classification system | Subject of resources |
| | Target audience | Optional | 0..n | Concept | Gender, age, and check for disease | Target audience of resources |
| | Type | Mandatory | 1..n | Concept | DCMI Type Vocabulary (a) | Type of resources |
| | Format | Optional | 0..n | String | Internet Media Type (b) | File format of resources; Medium and domain (size and period) |
| | Coverage | Optional | 0..n | String or concept | Thesaurus of geographic names | Resources-related space or temporal information |
| Relation | Right | Optional | 0..n | Free text | URI or text skill | Intellectual property right |
| | Source | Optional | 0..n | Free text | URI or reference | The history of reference resources |
| | Relation | Optional | 0..n | Free text | URI or reference | Related resources |

(a) Collection, dataset, event, image, interactive resource, moving image, physical object, service, software, sound, still image, and text.
(b) Application, audio, example, image, message, model, multipart, text, and video.

1. Health Information Metadata Schema 

All elements of the Dublin Core Metadata Element Set 
(http://dublincore.org/documents/dces/) were used as ele- 
ments of the metadata schema of the health information 
search engine in this study. They included the following: 
identifier, title, description, contributor, language, date, pub- 
lisher, and creator under the General Metadata category; 
subject, type, format, and coverage under the Content Clas- 
sification category; and right, source, and relation under the 
Relation category. In addition, the target audience was added 
to the Content Classification category to describe who the 
main target audiences of the health information documents 
were (Table 1). We specified identifier and title of General 
Metadata and subject and type of Content Classification as 
mandatory elements among these 16 metadata elements. 
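
As a rough illustration of how such a schema could be checked by a machine, the following Python sketch encodes the 16 element names with the optionality and cardinality given in Table 1 and validates a record against them. The `Element` class, the `validate` function, the record layout (element name mapped to a list of values), and the example URI are assumptions made for this sketch; they are not part of the system described in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    mandatory: bool
    single: bool  # True when the cardinality is exactly 1


# Names, optionality, and cardinality follow Table 1.
SCHEMA = {
    e.name: e
    for e in [
        Element("identifier", True, True),   # mandatory, 1
        Element("title", True, True),        # mandatory, 1
        Element("subject", True, False),     # mandatory, 1..n
        Element("type", True, False),        # mandatory, 1..n
    ]
    + [
        Element(name, False, False)          # optional, 0..n
        for name in (
            "description", "contributor", "language", "date",
            "publisher", "creator", "target_audience", "format",
            "coverage", "right", "source", "relation",
        )
    ]
}


def validate(record):
    """Return a list of problems; an empty list means the record conforms."""
    problems = [f"unknown element: {n}" for n in record if n not in SCHEMA]
    for name, element in SCHEMA.items():
        values = [v for v in record.get(name, []) if v]
        if element.mandatory and not values:
            problems.append(f"missing mandatory element: {name}")
        if element.single and len(values) > 1:
            problems.append(f"element must occur once: {name}")
    return problems


# Hypothetical record for a single Web document.
record = {
    "identifier": ["http://example.org/oral-cancer"],
    "title": ["Oral cancer overview"],
    "subject": ["oral cancer"],
    "type": ["text"],
    "target_audience": ["adult"],
}
print(validate(record))  # prints []
```

A content producer could run a check of this kind before publishing, which corresponds to the validity-testing role of the schema described in the Methods.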

2. Ontologies 

The ontologies to be used for the health information search 
engine included the medium types of resources, target audi- 
ence, and description of the subject heading. 

The ontology for the medium type to classify the types of
health information was extracted from the DCMI Type
Vocabulary [17]. It included data set, still image, moving
image, interactive resource, text, and collection of several
types of medium (Table 2). The ontology for the target audience
included the concepts used for differentiating health
information supplied only for special audience groups. These
were gender, age, and disease condition (Table 2).

The subject heading ontology was composed of the symp- 
toms of health problems, diagnostic methods, treatment 
methods, and prevention methods (Figure 2). A given health 
problem involves one or more symptoms, diagnostic meth- 
ods, treatment methods, and prevention methods, all of 
which can be presented according to various aspects, such as 
relevant body parts, disease status, gender, and age. A spe- 
cific symptom involves one or more symptom management 
methods, and a treatment involves various treatment meth- 
ods, such as medication, operation, and complementary and 
alternative therapy. A prevention method can also involve
different types of health promotion and disease protection
methods.
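
The structure described above can be sketched as a small data check. This is a minimal Python illustration, assuming a dictionary layout and field names chosen for this example; the "influenza" entry is illustrative data, not an entry from the study's ontology.

```python
# A health problem involves one or more of each of these parts.
REQUIRED_PARTS = (
    "symptoms", "diagnostic_methods", "treatment_methods", "prevention_methods",
)
# Aspects under which a health problem can additionally be presented.
OPTIONAL_ASPECTS = ("body_parts", "disease_status", "gender", "age")
# Kinds of treatment method named in the text.
TREATMENT_KINDS = {"medication", "operation", "complementary_alternative"}


def check_health_problem(entry):
    """Return a list of structural problems found in one ontology entry."""
    problems = []
    for part in REQUIRED_PARTS:
        if not entry.get(part):
            problems.append(f"{part}: at least one value is required")
    for kind, _name in entry.get("treatment_methods", []):
        if kind not in TREATMENT_KINDS:
            problems.append(f"unknown treatment kind: {kind}")
    for key in entry:
        if key not in REQUIRED_PARTS and key not in OPTIONAL_ASPECTS:
            problems.append(f"unknown field: {key}")
    return problems


# Illustrative entry only.
influenza = {
    "symptoms": ["fever", "cough"],
    "diagnostic_methods": ["rapid antigen test"],
    "treatment_methods": [("medication", "antiviral drug")],
    "prevention_methods": ["vaccination"],
    "age": ["elderly"],
}
print(check_health_problem(influenza))  # prints []
```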

Table 2. Ontologies for type of medium and target audience

| Type of medium | Function |
|---|---|
| Data set | Table and data consisted of database |
| Still image | Still visual material |
| Moving image | Video |
| Interactive resource | Data requesting users interaction |
| Text | Data consisted of string |
| Collection | Data set of several types of medium |

| Target audience | Content |
|---|---|
| Gender | Male; Female |
| Age | Child (from zero years to under 20 years old): newborn; infant or toddler, 0 to 3 years (infant: under one year old; toddler: from one to under three years old); preschooler: from three to under seven years old; school children: from seven to under 13 years old; adolescent: from 13 to under 20 years old |
| | Young adult: from 20 to under 40 years old |
| | Middle age: from 40 to under 60 years old |
| | Elderly: over 60 |
| Disease condition | Healthy people; Patient |

Concerning the cardinality of their relationships, the
relationship between a health problem and each of symptom,
diagnosis method, therapy method, and prevention method is 1:n;
the relationship between a health problem and each of body
parts, disease status, gender, and age is 0:n; the relationship
between a symptom and symptom management is 1:n; the
relationship between a therapy method and each of medication,
operation, and complementary/alternative therapy is 0:n; and
the relationship between a preventive method and each of health
promotion and disease protection method is 0:n.

The vocabulary system used to describe the subject heading
consists of concepts classifying the content of health
information. They can be grouped into disease information,
health promotion/disease prevention information, and
diagnosis/treatment management information. Disease information
is presented in terms of relevant body parts, disease
status, gender, and age groups. Health promotion/disease
prevention information incorporates concepts describing actions
an individual can take to maintain health and prevent
diseases as well as problems faced by family and society as
a whole. Diagnosis/treatment information includes concepts
describing diagnostic tests, symptom management,
and various treatments required when a health problem occurs.
The vocabulary system used in this study was mapped
to SNOMED CT, and about 1,300 terms were proposed to
describe the subject headings of the health information
provided on the Internet (Table 3).

[Figure 2. Ontology for health information subject of heading.
Nodes shown: health problem (in body parts, in disease status, in
gender, in age); symptom; symptom management; diagnosis methods;
therapy method (medication, operation, complementary and
alternative therapy); prevention method (health promotion,
disease protection).]

Table 3. Vocabulary system for ontology and number of subjects

Counts marked [illegible] could not be recovered from the source text.

| Level 1 | Level 2 | No. of subject headings |
|---|---|---|
| Health problems by body location/systems (491) | 1. The circulatory system: heart, artery, veins, capillary-related health problems | 40 |
| | 2. Blood system: blood and lymphoid-related health problems | 27 |
| | 3. Musculoskeletal system: muscles, bone, joints and ligaments-related health problems | 67 |
| | 4. Brain and nerves system: brain, spinal cord and nerves-related health problems | 77 |
| | 5. Digestive system: esophagus, stomach, small intestine, large intestine, liver, gallbladder, biliary tract and pancreas-related health problems | 54 |
| | 6. Ear, nose and throat: ear, hearing, nose, and sense of smell-related health symptoms | 33 |
| | 7. Endocrine system: adrenal gland, thyroid, testicles, ovary, pituitary gland-related health problems | 20 |
| | 8. Ophthalmology: eye, eyesight-related health symptoms | 21 |
| | 9. Immune system: bone marrow, lymph nodes, spleen, tonsil and thymus-related health problems | 11 |
| | 10. Urinary system: kidney, urethra, bladder, urethra-related health problems | 22 |
| | 11. Respiratory system: nasal cavity, laryngopharynx, trachea, bronchus, lung and pleura-related health problems | 35 |
| | 12. Oral cavity: lip, tooth, gum, soft palate, hard palate, tongue and uvula-related health problems | 34 |
| | 13. Integumentary system: skin, nail and hair-related health problems | 22 |
| | 14. Genitourinary system: prostate, testicle, penis, ovary, breast, uterus, cervix and vagina-related health problems | 21 |
| Disorders and conditions (533) | 1. Cancer | [illegible] |
| | 2. Infections | [illegible] |
| | 3. Metabolic problems | [illegible] |
| | 4. Diabetes mellitus | 22 |
| | 5. Genetic/birth defects | [illegible] |
| | 6. Injuries and wounds | 74 |
| | 7. Mental health and behavior | [illegible] |
| | 8. Poisoning, toxicology and environmental health | [illegible] |
| | 9. Pregnancy and reproduction | [illegible] |
| | 10. Substance abuse problems | 32 |
| Health problems by gender (106) | 1. Men's health | [illegible] |
| | 2. Women's health | 82 |
| Health problems by age (154) | 1. Children's health | [illegible] |
| | 2. Seniors' health | 64 |
| Diagnosis and therapy ([illegible]) | 1. Health examination | [illegible] |
| | 2. Symptoms | [illegible] |
| | 3. Drug therapy | [illegible] |
| | 4. Surgery and rehabilitation | 41 |
| | 5. Transplantation and donation | [illegible] |
| | 6. Complementary and alternative therapies | [illegible] |
| | 7. Health system | [illegible] |
| Health and wellness (325) | 1. Food and nutrition | 71 |
| | 2. Health and fitness | [illegible] |
| | 3. Health habits | 24 |
| | 4. Vaccination | 24 |
| | 5. Substance abuse | 25 |
| | 6. Oral health | [illegible] |
| | 7. Skin health | 17 |
| | 8. Sexual health | [illegible] |
| | 9. Mental health | 11 |
| | 10. Disaster preparation and recovery | [illegible] |
| | 11. Safety issues | 33 |
| | 12. Individual right | 16 |
| | 13. Family and social issues | 25 |



3. Development of Health Information Ontology Search 
System 

The diagram of the ontology-based health information search 
system developed in the study is shown in Figure 3. REKMS 
collects and extracts health information from webpages. The 
RSA system analyzes a search term entered by a user using 
mobile and Web applications, matches the analyzed concepts 
of the search term with health information extracted from 
the REKMS, and returns ontology-based search results using
the Onto DB developed with the metadata subject headings
of health information mapped to the SNOMED CT concepts.

A search can begin in three different ways. A user can enter
a search term into the search window, select a keyword
from the Onto Tree, or do both. If only the search window is
used, search results for the search term will appear. If only
the Onto Tree is used, search results for the predefined
identifiers will appear. If both the search window and the
Onto Tree are used, search results for both the search
term and the predefined identifiers will appear. The Onto Tree
was developed using the metadata title and identifier of the
Onto DB.
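
The three entry points can be sketched in Python as one function with two optional inputs. This is a minimal sketch under stated assumptions: the index layout, the `search` function, and the sample documents are hypothetical, and combining both inputs is interpreted here as narrowing (intersection), which is consistent with the smaller result set reported when 'oral' is selected in the Onto Tree after entering 'cancer'.

```python
# Hypothetical indexed documents: text plus assigned metadata identifiers.
INDEX = [
    {"url": "/a", "text": "Oral cancer overview", "ids": {"oral"}},
    {"url": "/b", "text": "Lung cancer screening", "ids": {"lung"}},
    {"url": "/c", "text": "Brushing and gum care", "ids": {"oral"}},
]


def search(index, term=None, tree_ids=None):
    """Return URLs matching the typed term, the Onto Tree selection, or both."""
    hits = index
    if term:
        # Search window: plain case-insensitive term match.
        hits = [d for d in hits if term.lower() in d["text"].lower()]
    if tree_ids:
        # Onto Tree: keep documents carrying a selected identifier.
        wanted = set(tree_ids)
        hits = [d for d in hits if d["ids"] & wanted]
    return [d["url"] for d in hits]


print(search(INDEX, term="cancer"))                     # prints ['/a', '/b']
print(search(INDEX, tree_ids=["oral"]))                 # prints ['/a', '/c']
print(search(INDEX, term="cancer", tree_ids=["oral"]))  # prints ['/a']
```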

REKMS collects and extracts health information from the
webpages. WebSprider of the REKMS collects source webpages
with health information from Health iN
(http://hi.nhic.or.kr) and Health Park. Once a webpage is
downloaded, the links to new source webpages contained in the
downloaded webpage are extracted. This process is repeated until
there are no more source webpages to collect. The RssFilter
of the REKMS extracts the titles and contents of the
health information of the webpages after removing unnecessary
information and stores them in the database.
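
The collection loop described here (download a page, extract further sources, repeat until none remain) can be sketched as a breadth-first crawl. The sketch below is not the REKMS implementation: the `crawl` function, the `fetch` callback, and the in-memory `SITE` dictionary standing in for real webpages are all assumptions made so that the example runs without network access.

```python
from collections import deque

# Hypothetical site: URL -> (title, content, outgoing links).
SITE = {
    "/home": ("Home", "Welcome", ["/flu", "/cold"]),
    "/flu": ("  Influenza ", "Rest and fluids.", ["/home", "/cold"]),
    "/cold": ("Common cold", "Usually mild.", ["/missing"]),
}


def crawl(seeds, fetch):
    """Visit every page reachable from the seeds and store title and content.

    fetch(url) returns (title, content, links), or None if the page
    cannot be downloaded.
    """
    seen = set(seeds)
    queue = deque(seeds)
    stored = {}
    while queue:                      # stops when no new sources remain
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue
        title, content, links = page
        # Keep only the cleaned title and content, as a filter step would.
        stored[url] = (title.strip(), content.strip())
        for link in links:
            if link not in seen:      # queue each source only once
                seen.add(link)
                queue.append(link)
    return stored


pages = crawl(["/home"], SITE.get)
print(sorted(pages))  # prints ['/cold', '/flu', '/home']
```

The `seen` set is what guarantees termination when pages link back to each other.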




When a search term is entered, the QUERYSERVER
sends the term to the DICSERVER. The DICSERVER performs
a morpheme analysis of the search term and returns identifiers
of the analyzed concepts. RONTO (Sorter) loads the
contents of the Onto DB into memory to increase
search capacity and reduce the database load. The health
information of the webpages extracted with the REKMS is
searched using the metadata of the RONTO and a priori
algorithms. If there is a match, a metadata identifier is
assigned to the webpage and stored in the indexing file. One
document from the webpage can have more than one metadata
identifier.
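
The indexing step can be illustrated with a simplified Python sketch. It swaps in plain substring matching for the morpheme analysis and the algorithms named above, so it only shows the data flow: terms are loaded into memory once, each page is matched against them, and a page may receive more than one identifier. The `ONTO_DB` contents, the identifiers `C001` and `C002`, and the sample pages are hypothetical and are not SNOMED CT codes.

```python
# Hypothetical Onto DB fragment: identifier -> terms, including lay synonyms.
ONTO_DB = {
    "C001": ["oral cancer", "mouth cancer"],
    "C002": ["tongue cancer"],
}


def build_index(pages, onto):
    """Assign metadata identifiers to pages whose text mentions a known term."""
    # Loaded into memory once, so the database is not queried per page.
    lookup = {term.lower(): cid for cid, terms in onto.items() for term in terms}
    index = {}
    for url, text in pages.items():
        lowered = text.lower()
        ids = sorted({cid for term, cid in lookup.items() if term in lowered})
        if ids:                       # unmatched pages are left out of the index
            index[url] = ids
    return index


sample_pages = {
    "/p1": "Mouth cancer and tongue cancer care",
    "/p2": "Tongue cancer symptoms",
    "/p3": "Healthy eating",
}
print(build_index(sample_pages, ONTO_DB))
# prints {'/p1': ['C001', 'C002'], '/p2': ['C002']}
```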

Regarding the performance of the search engine, Figure 4 
shows the differences between the search results of the exist- 
ing system and those of the newly developed 'Ontology-based 
Health Information Search System.' With the existing system, 
there were a total of 150 search results when the search was 
made with the keyword 'cancer,' while there were 12 search 
results when the search was made with the keywords 'oral + 
cancer.' On the other hand, 4 search results were found when 
the keyword 'cancer' was entered and then 'oral' was selected 
in the Onto Tree. 

Content not closely related to oral cancer was presented
when the search was made using the keywords 'oral + cancer'
with the existing system, while search results for 'tongue
cancer', which is closely related to oral cancer, appeared
when the search was made with the newly developed system.
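
One way to see why 'tongue cancer' can surface for an 'oral cancer' query is to expand the query concept to its descendants in the hierarchy before matching. The following Python sketch assumes a tiny hypothetical hierarchy fragment and document set; the `CHILDREN` mapping and the function names are illustrative and do not reproduce the Onto DB.

```python
# Hypothetical fragment of a subject-heading hierarchy (parent -> children).
CHILDREN = {
    "cancer": ["oral cancer", "lung cancer"],
    "oral cancer": ["tongue cancer"],
}

# Hypothetical documents, each indexed with one subject heading.
DOCS = {
    "/d1": "tongue cancer",
    "/d2": "lung cancer",
    "/d3": "oral cancer",
}


def expand(concept, children):
    """Return the concept followed by all of its descendants."""
    result, stack, seen = [], [concept], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result.append(current)
        stack.extend(children.get(current, []))
    return result


def search_by_concept(concept, docs, children):
    """Return documents indexed with the concept or any narrower concept."""
    wanted = set(expand(concept, children))
    return sorted(url for url, subject in docs.items() if subject in wanted)


print(search_by_concept("oral cancer", DOCS, CHILDREN))  # prints ['/d1', '/d3']
```

A plain keyword match on 'oral' and 'cancer' would miss `/d1`, whereas the expanded concept set includes it.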

IV. Discussion 

Health information on the Internet is provided in numer- 
ous forms by various types of institutions and individuals, 
including ordinary people and specialists. It is not easy for 
the ordinary person to find suitable and reliable information 
that is personally relevant under this environment. Health 
portals have been introduced to solve these problems. How- 
ever, these portals often fail to supply the appropriate infor- 
mation that users want. 

Existing health information portals that collect and pres- 
ent health information for the general public pose various 
problems for information users and information producers 
[18]. Information users experience various types of difficul- 
ties when they search for health information on the Internet, 
such as searching difficulties, connection problems, as well 
as issues regarding quality and level of specialization.
Information producers also find difficulty in producing health
information on the Internet, maintaining internal and external
links, indexing, managing content quality, and ensuring
interoperability.

The greatest obstacle in managing health and medical 
information is that the processed information is neither 
structured nor indexed using controlled vocabularies; thus, 
it is difficult to extract relevant information. To overcome 
this problem, unstructured text-based content provided on 
the Internet is processed with a natural language processing 
application to extract concepts and map them to a standard 
terminology system, such as SNOMED CT. Then metadata 
of the concepts is defined and stored in the knowledge base 
for later use. By utilizing metadata, it is possible to find in- 
formation resources easily, to structure information based 
on target audience of the information and themes, and to 
ensure the interoperability required for information sharing. 

With this background, the development of a health infor- 
mation search engine based on metadata ontology is expect- 
ed to be a new solution that can provide both information 
producers and users with semantically interoperable health 
information. Various studies have aimed to develop health 
information ontology. In particular, the Ontology-based 
Health Information Inquiry System [19] was built as part 
of the National Health Promotion Information System to 
strengthen search capabilities using the health information
database system. However, it was limited to a methodology
for developing the ontology and was not actually implemented
in a service system.

This study aimed to develop a metadata schema and ontology
that ensure interoperability in exchanging and communicating
health information provided on the Internet, to develop
a semantic Web model and portal based on this metadata schema
and ontology for creating, collecting, and providing
health information, and to implement them in the Health Park
search engine.

The metadata schema proposed in the study has 16 ele- 
ments, with the target audience element added to the 15 
metadata elements of the Dublin Core Metadata Element 
Set which is already widely known, to describe the main 
target audience of the health information. Among these 16 
metadata elements, identifier, title, subject, and type were 
proposed as mandatory elements. 

Also, the health information ontology was divided into 
health problem, prevention, symptom, diagnosis, and ther- 
apy information. These classes can then be divided further. 
For example, health problem was further classified by body 
location, disease status, gender, and age. The ontology vocab- 
ulary describing subject headings was mapped to SNOMED 
CT, and a list of about 1,300 terms was suggested to describe 
health information provided on the Internet. 

To develop the ontology-based search engine, first a database
was built by collecting concepts describing the 'Title/Contents'
of the health and disease information from a
health portal operated by the National Health Insurance
Corporation. After the morpheme analysis of the titles and
content collected, a health information classifier was developed
through a comparison with the ontology metadata database.
While the newly developed ontology search engine generated
one-third of the search results produced by the existing
search engine, the search results were more accurate. The
newly developed search engine was able to retrieve 'tongue
cancer' as a type of 'oral cancer'. This was possible because
the search keywords were mapped to the ontology with a
hierarchical structure.

Therefore, the health information search engine based on
metadata and ontology that was developed in this study is
expected to be a reliable means of providing quality
information to information producers and users.